Minimally Supervised Multilingual Taxonomy and Translation Lexicon Induction
Abstract
We present a novel algorithm for the acquisition of multilingual lexical taxonomies (including hyponymy/hypernymy, meronymy and taxonomic cousinhood) from monolingual corpora with minimal supervision in the form of seed exemplars, using discriminative learning across the major WordNet semantic relationships. This capability is also extended robustly and effectively to a second language (Hindi) via cross-language projection of the various seed exemplars. We also present a novel model of translation dictionary induction via multilingual transitive models of hypernymy and hyponymy, using these induced taxonomies. Candidate lexical translation probabilities are based on the probability that their induced hyponyms and/or hypernyms are translations of one another. We evaluate all of the above models on English and Hindi.
Similar resources
Minimally supervised techniques for bilingual lexicon extraction
Normally, word translations are extracted from non-parallel bilingual corpora, and an initial bilingual lexicon, i.e., a list of known translations, is typically used to aid the learning process. This thesis studies a series of novel techniques that use scarce resources. To make the study even more challenging, only minimal use of resources was allowed and important major lin...
Improving Translation Lexicon Induction from Monolingual Corpora via Dependency Contexts and Part-of-Speech Equivalences
This paper presents novel improvements to the induction of translation lexicons from monolingual corpora using multilingual dependency parses. We introduce a dependency-based context model that incorporates long-range dependencies, variable context sizes, and reordering. It provides a 16% relative improvement over the baseline approach that uses a fixed context window of adjacent words. Its Top...
Multilingual Lexicon, Translation and Generation for Formal Proofs A Grammar based approach
This three-year PhD thesis project is conducted as a collaboration between the Logic group at the Mathematics Laboratory (LAMA), Université de Savoie, France, and the Language Technology group at the Department of Computer Science, Chalmers University of Technology, Sweden. It is co-supervised by Christophe Raffalli and Aarne Ranta. In this project, we intend to develop tools and resources capable of tr...
Supervised Bilingual Lexicon Induction with Multiple Monolingual Signals
Prior research into learning translations from source and target language monolingual texts has treated the task as an unsupervised learning problem. Although many techniques take advantage of a seed bilingual lexicon, this work is the first to use that data for supervised learning to combine a diverse set of signals derived from a pair of monolingual corpora into a single discriminative model....
Symbiosis between a Multilingual Lexicon and Translation Example Banks
We propose a symbiotic framework in which correspondences between electronic multilingual lexicons and translation example banks can be captured, so that their functions and contents may benefit and improve upon one another. Several mechanisms are used for this purpose: i.) two flexible annotation schemas, S-SSTC and SSTC+L, for supporting irregular multi-level correspondences across languages;...